Reporting Exact and Approximate Regular Expression Matches
Authors
Abstract
While much work has been done on determining if a document or a line of a document contains an exact or approximate match to a regular expression, less effort has been expended in formulating and determining what to report as "the match" once such a "hit" is detected. For exact regular expression pattern matching, we give algorithms for finding a longest match, all symbols involved in some match, and finding optimal submatches to tagged parts of a pattern. For approximate regular expression matching, we develop notions of what constitutes a significant match, give algorithms for them, and also for finding a longest match and all symbols in a match.
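To make the question of what to report concrete, the following is a minimal brute-force sketch of reporting a longest exact match. It is not the algorithm from the paper; the function name `longest_match` is hypothetical, and the approach (trying every start and end position with Python's standard `re` module) is chosen only for clarity and takes a quadratic number of match attempts.

```python
import re


def longest_match(pattern, text):
    """Return (start, end) of a longest substring of `text` that fully
    matches `pattern`, or None if no substring matches.

    Illustrative brute force only: not the paper's method.
    """
    compiled = re.compile(pattern)
    best = None
    n = len(text)
    for start in range(n + 1):
        # Try the longest candidate first so the first success at this
        # start position is the longest one beginning here.
        for end in range(n, start - 1, -1):
            # Stop once the candidate cannot beat the best span found so far.
            if best is not None and end - start <= best[1] - best[0]:
                break
            if compiled.fullmatch(text, start, end):
                best = (start, end)
                break
    return best


# A backtracking engine reports the first alternative that succeeds,
# which need not be the longest match at that position.
first_hit = re.search("a|ab|abc", "xabcx").span()
longest_hit = longest_match("a|ab|abc", "xabcx")
```

Here `first_hit` covers only the single symbol `a`, while `longest_hit` covers `abc`, which shows that the reported match depends on the reporting rule and not only on whether a hit exists.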
Similar Resources
Approximate Regular Expression Searching with Arbitrary Integer Weights
We present a bit-parallel technique to search a text of length n for a regular expression of m symbols permitting k differences in worst case time O(mn / log_k s), where s is the amount of main memory that can be allocated. The algorithm permits arbitrary integer weights and matches the complexity of the best previous techniques, but it is simpler and faster in practice. In our way, we define a n...
ProMiner: Organism-specific protein name detection using approximate string matching
were required, (2) disambiguation failed because of missing synonyms, e.g. "vertebrate" and (3) for several cases the provided gold standard might be incorrect as considered abstracts describe findings in rat or human instead of mouse.
Description | Examples
Unspecific synonym | growth retarded, perinatal lethality, long lived
Wrong context | TGF-beta superfamily, c-myc tumors
Unknown ambiguity | high ...
Derivatives of Approximate Regular Expressions
Our aim is to construct a finite automaton recognizing the set of words that are at a bounded distance from some word of a given regular language. We define new regular operators, the similarity operators, based on a generalization of the notion of distance and we introduce the family of regular expressions extended to similarity operators, that we call AREs (Approximate Regular Expressions). W...
Computing Semantic Similarity between Skill Statements for Approximate Matching
This paper explores the problem of computing text similarity between verb phrases describing skilled human behavior for the purpose of finding approximate matches. Four parsers are evaluated on a large corpus of skill statements extracted from an enterprise-wide expertise taxonomy. A similarity measure utilizing common semantic role features extracted from parse trees was found superior to an i...
REAFUM: Representative Approximate Frequent Subgraph Mining
Noisy graph data and pattern variations are two thorny problems faced by mining frequent subgraphs. Traditional exact-matching based methods, however, only generate patterns that have enough perfect matches in the graph database. As a result, a pattern may either remain undetected or be reported as multiple (almost identical) patterns if it manifests slightly different instances in different gr...